Optimize reflection of F# types, part 2 #9784

Merged: 2 commits, Jul 25, 2020


@kerams (Contributor) commented Jul 25, 2020

Continuation of #9714.

  • PreComputeRecordConstructor
  • PreComputeRecordFieldReader
  • PreComputeRecordReader - reworked from last time
  • PreComputeUnionConstructor
  • PreComputeUnionReader
  • PreComputeUnionTagReader

PreComputeTupleReader and PreComputeTupleConstructor are a little more complicated, so perhaps another time.

The remaining members of the PreCompute family only return the corresponding System.Reflection metadata (e.g. a MethodBase) for further reflection use.

| Type | Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|--:|--:|--:|--:|--:|--:|--:|
| PreComputeRecordConstructor | Reflection | 554.210 ns | 3.3056 ns | 3.0921 ns | 0.0134 | - | - | 112 B |
| PreComputeRecordConstructor | Compiled | 9.999 ns | 0.0416 ns | 0.0389 ns | 0.0057 | - | - | 48 B |
| PreComputeRecordFieldReader | Reflection | 112.941 ns | 1.9830 ns | 1.8549 ns | 0.0029 | - | - | 24 B |
| PreComputeRecordFieldReader | Compiled | 3.431 ns | 0.1047 ns | 0.1120 ns | 0.0029 | - | - | 24 B |
| PreComputeRecordReader | Reflection | 562.094 ns | 10.2099 ns | 9.5503 ns | 0.0134 | - | - | 112 B |
| PreComputeRecordReader | Compiled | 22.212 ns | 0.2661 ns | 0.2489 ns | 0.0134 | - | - | 112 B |
| PreComputeUnionConstructor | Reflection | 473.935 ns | 5.1988 ns | 4.8629 ns | 0.0105 | - | - | 88 B |
| PreComputeUnionConstructor | Compiled | 7.737 ns | 0.0595 ns | 0.0497 ns | 0.0048 | - | - | 40 B |
| PreComputeUnionReader | Reflection | 318.424 ns | 4.2661 ns | 3.9905 ns | 0.0086 | - | - | 72 B |
| PreComputeUnionReader | Compiled | 13.706 ns | 0.3290 ns | 0.4822 ns | 0.0086 | - | - | 72 B |
| PreComputeUnionTagReader | Reflection | 373.670 ns | 2.4971 ns | 2.3358 ns | 0.0029 | - | - | 24 B |
| PreComputeUnionTagReader | Compiled | 6.567 ns | 0.1600 ns | 0.2136 ns | - | - | - | - |
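For readers new to the approach, the "Compiled" rows replace per-call reflection with a delegate that is built once from an expression tree and then reused. Below is a minimal sketch of that idea for a record constructor, not the code from this PR: `compileRecordConstructor` and the `Point` record are hypothetical names, and the sketch relies only on the public `FSharpValue.PreComputeRecordConstructorInfo` and System.Linq.Expressions APIs.

```fsharp
open System
open System.Linq.Expressions
open Microsoft.FSharp.Reflection

type Point = { X: int; Y: string }

/// Hypothetical helper: compiles obj[] -> obj once via an expression tree,
/// instead of invoking the ConstructorInfo through reflection on every call.
let compileRecordConstructor (recordType: Type) : obj[] -> obj =
    // The record's constructor; its parameters follow field declaration order
    let ctor = FSharpValue.PreComputeRecordConstructorInfo(recordType)
    let args = Expression.Parameter(typeof<obj[]>, "args")
    let ctorArgs =
        ctor.GetParameters()
        |> Array.mapi (fun i p ->
            // (ParamType) args[i]: unbox for value types, castclass for reference types
            Expression.Convert(Expression.ArrayIndex(args, Expression.Constant(i)), p.ParameterType)
            :> Expression)
    // Box the new record so the delegate can return obj
    let body = Expression.Convert(Expression.New(ctor, ctorArgs), typeof<obj>)
    let compiled = Expression.Lambda<Func<obj[], obj>>(body, args).Compile()
    fun values -> compiled.Invoke(values)

let makePoint = compileRecordConstructor typeof<Point>
let p = makePoint [| box 3; box "a" |] :?> Point
```

The expensive work (finding the constructor, building and compiling the tree) happens once; each later call is a plain delegate invocation.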

Comment on lines -289 to +377
```diff
-(fun (obj: obj) ->
-    let m2b = typ.GetMethod("GetTag", BindingFlags.Static ||| bindingFlags, null, [| typ |], null)
-    m2b.Invoke(null, [|obj|]) :?> int)
+let m2b = typ.GetMethod("GetTag", BindingFlags.Static ||| bindingFlags, null, [| typ |], null)
+(fun (obj: obj) -> m2b.Invoke(null, [|obj|]) :?> int)
```
Contributor Author


There's probably no need to look up the method on every invocation. I didn't know how to test this specifically, though. When is it ever the case that a DU doesn't have a Tag property?

Contributor


A DU always has a tag as far as I know: struct and reference DUs, single-case and multi-case, all have Tag properties.

Contributor Author


Hmm, then I'm not sure if this branch executes.

Member


I think Paul had a PR to do tagless DUs a few (lot of) years ago.
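As an illustration of where the review thread lands, here is a sketch of a compiled tag reader that handles both the instance Tag property and the static GetTag method used by the fallback branch in the diff, resolving whichever member exists once at precompute time. Assumptions: `compileUnionTagReader` and `Shape` are hypothetical names, and the member is located with `FSharpValue.PreComputeUnionTagMemberInfo`; this is not the implementation merged in the PR.

```fsharp
open System
open System.Linq.Expressions
open System.Reflection
open Microsoft.FSharp.Reflection

/// Hypothetical helper: compiles an obj -> int tag reader once,
/// instead of reflecting on every call.
let compileUnionTagReader (unionType: Type) : obj -> int =
    let value = Expression.Parameter(typeof<obj>, "value")
    let typed = Expression.Convert(value, unionType) :> Expression
    let body : Expression =
        match FSharpValue.PreComputeUnionTagMemberInfo(unionType) with
        | :? PropertyInfo as tagProp ->
            // Usual case: instance Tag property
            Expression.Property(typed, tagProp) :> Expression
        | :? MethodInfo as getTag when getTag.IsStatic ->
            // Fallback: static GetTag(x), looked up here once rather than per call
            Expression.Call(getTag, typed) :> Expression
        | other -> failwithf "Unexpected tag member: %O" other
    let compiled = Expression.Lambda<Func<obj, int>>(body, value).Compile()
    fun v -> compiled.Invoke(v)

type Shape =
    | Circle of radius: float
    | Square of side: float

let tagOf = compileUnionTagReader typeof<Shape>
```

With this shape, `tagOf (box (Square 2.0))` evaluates to 1, because case tags are numbered in declaration order.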

@KevinRansom (Member) left a comment

Looks good.

Thank you for this.

@cartermp (Contributor) left a comment

I was wondering if there was a way to further reduce allocations/CPU time by making the loop not call into an enumerator, but that's clearly a micro-optimization at this point.

@KevinRansom merged commit a4e0947 into dotnet:master on Jul 25, 2020
@kerams deleted the reflection-2 branch on Jul 26, 2020, 05:38
@kerams (Contributor Author) commented Jul 26, 2020

@cartermp, what loop? The only loops I see are those that are used to create expression trees during the precomputation. There are no loops inside the compiled delegates.

@cartermp (Contributor)

In this case I'm referring to a loop like this: https://github.com/dotnet/fsharp/pull/9784/files#diff-54378f53a8612b2adf50a6efb115ba5aR134

The compiler rewrites this to use an enumerator no matter which kind of for loop you use, see:

  • current impl
  • for x = 0 to loop

Rewriting it as a while loop and incrementing the index manually removes the enumerator, but it's unlikely to matter much from a perf standpoint. It's just kind of weird that trivially changing the for loop had no effect, when it usually does.
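To make the loop discussion concrete, here is a hedged sketch contrasting an array comprehension over a range with an explicit while loop that bumps the index manually. The function names are hypothetical, and whether a particular F# compiler version lowers the comprehension to enumerator-based code is exactly what the linked comparison checks; the sketch only shows the two forms side by side and that they produce the same array.

```fsharp
/// Array comprehension over a range. Depending on the compiler version,
/// this may be lowered to code that goes through an enumerator.
let squaresWithComprehension (n: int) : int[] =
    [| for i in 0 .. n - 1 -> i * i |]

/// Same result with a preallocated array and a manually bumped index,
/// which leaves nothing for the compiler to turn into an enumerator.
let squaresWithWhileLoop (n: int) : int[] =
    let result = Array.zeroCreate<int> n
    let mutable i = 0
    while i < n do
        result.[i] <- i * i
        i <- i + 1
    result
```

For example, `squaresWithWhileLoop 5` returns `[|0; 1; 4; 9; 16|]`. As noted in the thread, this only matters in hot paths, not in a one-time precompute step.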

@kerams (Contributor Author) commented Jul 26, 2020

Yeah, I'd be all for changing the loop if this were happening inside the delegate. I don't think it matters at all during the prep stage.

@abelbraaksma (Contributor) commented Jul 26, 2020

In hot paths it matters for ints: it's about 4x slower when it becomes an enumeration. That may not matter much here, though. There's an open issue about optimizing this better, and the example from @cartermp will help there.

@KevinRansom (Member)

This function is compileUnionCaseConstructorFunc, so it should not be called frequently.

@Daniel-Svensson (Contributor) commented Jul 28, 2020

@kerams, if you want to further improve performance (by 6-7 times on .NET Core 3.1 for the "get all properties" case), you can modify the expression to initialize the array in reverse order to avoid array bounds checks.

I have a method added to your benchmark from the previous PR, plus results lying around at home, that I can post next week if it is of interest. But it might be regarded as a micro-optimization and not too relevant here, since one might expect Expression.NewArrayInit to be as fast as possible, and it might be a flaw in the benchmark code.

@kerams (Contributor Author) commented Jul 28, 2020

@Daniel-Svensson, are you saying

```csharp
var a = new int[5];
a[0] = 0;
a[1] = 0;
a[2] = 0;
a[3] = 0;
a[4] = 0;
```

is 6 times slower than

```csharp
var a = new int[5];
a[4] = 0;
a[3] = 0;
a[2] = 0;
a[1] = 0;
a[0] = 0;
```

??

Apologies for using 'the other sharp'.

@Daniel-Svensson (Contributor)

@kerams, actually I believe I must have a bug in my F# (I can't check it this week); it was a quick hack late before bedtime.
If the length is unknown to the JIT, the reverse order is much faster, but thinking more about it, I would not expect such large improvements, only a double-digit percentage. There are many such "hacks" in the runtime.
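For reference, here is a sketch of the two array-building shapes under discussion for a "read all fields" delegate: `Expression.NewArrayInit`, versus allocating with `Expression.NewArrayBounds` and storing elements from the last index down to index 0. `compileRecordReader`, its `reverseOrder` flag, and the `Person` record are hypothetical. The sketch makes no performance claim; as the comment above notes, any real gain is probably far smaller than first measured.

```fsharp
open System
open System.Linq.Expressions
open Microsoft.FSharp.Reflection

type Person = { Name: string; Age: int }

/// Hypothetical helper: compiles obj -> obj[] reading every record field.
/// reverseOrder = false uses NewArrayInit; true allocates the array first
/// and fills it from the last index down to index 0.
let compileRecordReader (recordType: Type) (reverseOrder: bool) : obj -> obj[] =
    let value = Expression.Parameter(typeof<obj>, "value")
    let typed = Expression.Convert(value, recordType) :> Expression
    // (object) ((RecordType) value).Field for each field, in declaration order
    let reads =
        FSharpType.GetRecordFields(recordType)
        |> Array.map (fun field ->
            Expression.Convert(Expression.Property(typed, field), typeof<obj>) :> Expression)
    let body : Expression =
        if not reverseOrder then
            // new object[] { read0, read1, ... }
            Expression.NewArrayInit(typeof<obj>, reads) :> Expression
        else
            let arr = Expression.Variable(typeof<obj[]>, "arr")
            let allocate =
                Expression.Assign(arr, Expression.NewArrayBounds(typeof<obj>, Expression.Constant(reads.Length) :> Expression))
                :> Expression
            // arr[n-1] = read(n-1); ...; arr[0] = read0
            let stores =
                [ for i in reads.Length - 1 .. -1 .. 0 ->
                    Expression.Assign(Expression.ArrayAccess(arr, Expression.Constant(i) :> Expression), reads.[i])
                    :> Expression ]
            // A block's value is its last expression: the filled array
            Expression.Block([ arr ], allocate :: stores @ [ arr :> Expression ]) :> Expression
    let compiled = Expression.Lambda<Func<obj, obj[]>>(body, value).Compile()
    fun record -> compiled.Invoke(record)
```

Both variants return the same `obj[]`; only the shape of the compiled code differs, which is what the JIT's bounds-check elimination would see.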

nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Feb 23, 2021
* Compile PreComputeRecordConstructor, PreComputeRecordReader, PreComputeRecordFieldReader

* Compile PreComputeUnionConstructor, PreComputeUnionTagReader, PreComputeUnionReader
5 participants